Monte Carlo Dropout for Predicting Prices with Deep Learning and Tensorflow Applied to the IBEX35

Abstract

This paper studies the implementation of the Monte Carlo (MC) dropout method for predicting returns of the Ibex 35's historical constituents. This methodology produces multiple predictions for the same input data, which makes it possible to compute a mean and a standard deviation for the predictions. Using 100 predictions and a filter based on the standard deviations, some models generated positive returns in the test set, whereas the first individual prediction of each model lost money during the same period. These results illustrate the usefulness of including uncertainty in predictions.

In addition, a custom metric was defined for training the models. It mirrors the Sharpe ratio, because standard metrics do not fully reflect reality for deep learning models applied to finance, given the asymmetry in returns.

Finally, the models are compared using several risk-adjusted return ratios. The simple recurrent neural network model was the worst performer in the test set. The LSTM and the GRU with the strictest filter obtained the best results for the ratios considered. The 1D convolutional model performed better than the simple recurrent neural network model.

1. Introduction

Deep learning has shown great advances in multiple fields. In this paper, it is used to predict the next-day returns of the constituents of the Ibex 35. One of the great capabilities of deep learning is the automated extraction of features. In that sense, four different models are compared for making predictions using the percentage change of the open, close, high and low prices, together with the standard deviation, kurtosis and skewness of the returns over the last 22 days.

These models always produce a prediction, even when that prediction is highly uncertain. By computing multiple predictions for each model, it is possible to average the results and to compute their standard deviation as a measure of the uncertainty in the prediction. In this paper this is achieved by randomly deactivating a percentage of the connections in the neural networks, following Gal and Ghahramani (2016). Finally, these models are compared with a buy-and-hold position in the Ibex 35 over the test set and with a single prediction of each model.

2. Literature review

There is a sound body of literature on the application of deep learning to predicting stock returns. Using monthly data and multiple input variables obtained from Refinitiv, different neural network structures are capable of predicting returns using MSE as the loss function and dropout layers to regularize the large number of parameters given the few data points available (Abe and Nakayama, 2018).

A comparison of different deep learning models and other machine learning algorithms yields the long short-term memory (LSTM) as the best structure for predicting stock market returns for multiple forecast windows (Nabipour et al., 2020), when evaluated for four regression losses, amongst them, the MSE. In addition, another study on the constituents of the S&P500 shows the LSTM cells as the best performing structure for predicting stock returns (Fischer and Krauss, 2018). Moreover, when using only a few stocks, an LSTM model performed better than a 1D Convolutional layer in terms of MSE.

Nevertheless, other studies show the predictive power of a convolutional layer when applied to a few stocks (Sayavong, Wu and Chalita, 2019). In addition, using daily returns from the Chinese stock market, the convolutional layer shows some predictive capacity (Chen and He, 2018).

As for the dropout technique for estimating uncertainty, other authors have proposed different methodologies such as the variational dropout (Molchanov, Ashukha and Vetrov, 2017) or the single shot MC dropout approximation (Brach, Sick and Dürr, 2020).

For comparison, the Sharpe ratio (Sharpe, 1994), the Sortino ratio (Sortino and Price, 1994) and the information ratio (Goodwin, 1998) are computed for each model in the test set.

3. Methodology and analysis of results

In this part, we start writing the required code. First of all we import the required libraries.

The versions of the different libraries are shown to facilitate reproducibility.

In the next few steps, four neural networks predicting a stock's daily returns are compared. These models are composed of two layers, each one followed by a batch normalization layer (Ioffe and Szegedy, 2015) and a dropout layer (Baldi and Sadowski, n.d.). After that there is a fully connected layer and a final fully connected layer which outputs the prediction. The two initial layers have 16 units each, while the next fully connected layer has eight neurons, and the last layer uses only one neuron. The activation function is the hyperbolic tangent for every layer except the last one, which uses an exponential linear unit activation (Clevert, Unterthiner and Hochreiter, 2016) to reflect the asymmetry in stock returns: a negative return cannot be lower than –100%, but positive returns are not limited. Finally, the dropout rate is set to 50%.

The four models differ in the type of layer used for the initial two layers. The chosen layers are two simple recurrent neural network cells, two gated recurrent unit (GRU) layers, two LSTM layers and two one-dimensional convolutional layers, the latter chosen for their advantages and uses with time series data (Kiranyaz et al., 2021). The convolutional layers use a kernel size of five.

To create these four models we define the following functions. The training parameter activates the dropout layer during prediction too.
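To make the role of the training parameter concrete, the following is a minimal NumPy sketch of what a dropout layer does when it stays active at prediction time. It is not the code used to build the models; the function name `dropout` and the toy activations are illustrative assumptions.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of the units and rescale the rest."""
    if not training:
        return x  # standard inference: the layer passes its input through unchanged
    keep_mask = rng.random(x.shape) >= rate  # True for the units that stay active
    return x * keep_mask / (1.0 - rate)      # rescale so the expected value is preserved

rng = np.random.default_rng(0)
activations = np.ones((1, 8))

# With training=False the output is deterministic.
deterministic = dropout(activations, rate=0.5, training=False, rng=rng)

# With training=True every call draws a new random mask, so repeated
# predictions on the same input can differ.
sample_a = dropout(activations, rate=0.5, training=True, rng=rng)
sample_b = dropout(activations, rate=0.5, training=True, rng=rng)
```

In Keras the same behaviour is obtained by calling the dropout layer with `training=True`, which is what keeps the layer stochastic during prediction.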

We want to use the data for the constituents of the Ibex 35 from 1995 to 2020. At the start of each year, those companies listed in the index are added to the study from that day onwards.

We will set the key for the API. This key is obtained using the Eikon APP to generate keys. Note that an instance of Workspace or Eikon needs to be running on your machine for the cell below to work.

The dates defined for the study are stored in the variable dates.

We define the required fields for our study and a dataframe that will be filled with the constituents of the Ibex 35. In addition, we declare the constituents of the Ibex 35. We need at least one field to obtain the data; in this case the GICS sector was chosen.

Then we fill this dataframe.

We already have a dataframe containing the historical constituents of the Ibex 35 and the dates when they were listed in the index. We want to obtain another dataframe containing the daily returns of each company from the date on which it was part of the Ibex 35. In addition, given that the get_timeseries function of eikon only returns 3000 data points, we request only 10 years of data with each call. As the high, low, close and volume information is not available until the market closes, we shift that data by one day. Moreover, we compute the daily percentage change of the open price and create additional columns with rolling statistical measures of these changes. Finally, we create another column containing the next-day percentage change of the open, which is used as the target for training the neural networks.
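As a rough illustration of the alignment between features and target, the sketch below derives the daily percentage change of the open, a 22-day rolling standard deviation of those changes, and the next-day change used as the target. It uses NumPy on a synthetic price series; the function name `open_features` and the toy prices are assumptions, and the actual study works on dataframes and also includes skewness and kurtosis.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def open_features(open_prices, window=22):
    """Return (pct_change, rolling_std, next_day_target), aligned row by row."""
    open_prices = np.asarray(open_prices, dtype=float)
    pct = open_prices[1:] / open_prices[:-1] - 1.0  # daily percentage change of the open
    rolling_std = sliding_window_view(pct, window).std(axis=1, ddof=1)
    # Row j uses the change observed on its own day, the rolling statistic
    # over the `window` changes ending that day, and the following day's
    # change as the target, so the target is never part of the features.
    features_pct = pct[window - 1:-1]
    features_std = rolling_std[:-1]
    target = pct[window:]
    return features_pct, features_std, target

# Synthetic series growing 1% per day (illustrative data only).
prices = 100.0 * 1.01 ** np.arange(30)
features_pct, features_std, target = open_features(prices)
```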

There are a few companies for which data is not available, thus we ignore those stocks.

The obtained data is converted to NumPy arrays and scaled to lie between 0 and 1 in the training set. In addition, it is divided into the training set, from the start of 1995 to the end of 2011, the validation set, from the start of 2012 to the end of 2015, and finally the test set, from the start of 2016 to the end of 2020.
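The scaling step can be sketched as follows, assuming a min-max scaler whose minimum and maximum are taken from the training set only and then reused for the later periods. The helper names and the toy arrays are illustrative, not taken from the original code.

```python
import numpy as np

def fit_min_max(train):
    """Column-wise minimum and maximum, computed on the training set only."""
    return train.min(axis=0), train.max(axis=0)

def apply_min_max(data, col_min, col_max):
    """Scale data with the training-set statistics; constant columns are left unscaled."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (data - col_min) / span

train = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
later = np.array([[4.0, 5.0]])  # stands in for validation or test rows

col_min, col_max = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_max)  # lies in [0, 1]
later_scaled = apply_min_max(later, col_min, col_max)  # may fall outside [0, 1]
```

Fitting the scaler on the training period alone means that values in the validation and test sets can fall outside the 0 to 1 range, which is expected and avoids leaking future information into training.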

We will create sequences of data of 22 days that will be used as input for the neural networks.
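A minimal sketch of how such sequences can be built with NumPy is shown below. It assumes a feature matrix with one row per day and a target vector in which each entry is the next-day return for that row; `make_sequences` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_sequences(features, targets, seq_len=22):
    """Turn (days, n_features) rows into (samples, seq_len, n_features) windows."""
    features = np.asarray(features)
    targets = np.asarray(targets)
    # sliding_window_view appends the window axis last: (samples, n_features, seq_len)
    windows = sliding_window_view(features, seq_len, axis=0)
    x = np.moveaxis(windows, -1, 1)  # reorder to (samples, seq_len, n_features)
    y = targets[seq_len - 1:]        # target attached to the last day of each window
    return x, y

features = np.arange(50).reshape(25, 2)  # 25 days, 2 features (toy data)
targets = np.arange(25)
x, y = make_sequences(features, targets)
```

With 25 days of data and windows of 22 days this produces four samples, each shaped as the recurrent and convolutional layers expect.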

We prepare the data and store it as TensorFlow datasets.

A custom loss is defined that attempts to mirror the Sharpe ratio, assuming a 0% risk-free return and 0.25% trading commissions, and adding a small constant to the denominator to avoid division by zero.
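One possible form of such a loss is sketched below in NumPy. It assumes that the prediction is used directly as the position size, that the commission is charged in proportion to the absolute position, and that the loss is the negative of the mean over the standard deviation of the resulting returns. These choices, like the name `sharpe_like_loss`, are assumptions for illustration and may differ from the loss actually used to train the models.

```python
import numpy as np

def sharpe_like_loss(y_true, y_pred, commission=0.0025, eps=1e-6):
    """Negative Sharpe-style ratio of the returns obtained by trading on y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumed strategy return: position times realised return, minus commissions.
    strategy_returns = y_pred * y_true - commission * np.abs(y_pred)
    # The risk-free return is 0%, so the numerator is just the mean return.
    # eps keeps the denominator away from zero when all returns are identical.
    return -strategy_returns.mean() / (strategy_returns.std() + eps)

realised = np.array([0.01, -0.02, 0.03])
good_loss = sharpe_like_loss(realised, np.sign(realised))   # positions match the moves
bad_loss = sharpe_like_loss(realised, -np.sign(realised))   # positions oppose the moves
flat_loss = sharpe_like_loss(realised, np.zeros(3))         # no positions taken
```

Because optimizers minimize the loss, the ratio is negated: positions that agree with the realised returns give a negative loss, and positions that oppose them give a positive one.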

In addition, we want to stop the training if the loss of the model does not improve by at least 0.01 for 15 consecutive epochs in the validation set or for 7 consecutive epochs in the training set, restoring the weights that achieved the lowest loss.
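The stopping rule can be expressed in a few lines of plain Python. The sketch below only decides whether training should stop given the loss histories; it does not restore weights, and the function names are hypothetical.

```python
def epochs_since_improvement(losses, min_delta=0.01):
    """Count the epochs since the loss last beat its best value by at least min_delta."""
    best = float("inf")
    stale = 0
    for loss in losses:
        if loss < best - min_delta:
            best, stale = loss, 0  # a real improvement resets the counter
        else:
            stale += 1
    return stale

def should_stop(train_losses, val_losses, train_patience=7, val_patience=15,
                min_delta=0.01):
    """Stop when either the validation or the training loss has stalled."""
    return (epochs_since_improvement(val_losses, min_delta) >= val_patience
            or epochs_since_improvement(train_losses, min_delta) >= train_patience)

improving = [1.0 - 0.1 * epoch for epoch in range(8)]  # drops by 0.1 per epoch
stalled = [1.0] * 8                                    # never improves after epoch 0
```

In Keras, two `EarlyStopping` callbacks with the `monitor`, `min_delta`, `patience` and `restore_best_weights` arguments cover the same logic, one per monitored loss.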

We also want to schedule the learning rate during training. It increases for the first 10 epochs and then starts to decrease following an exponential function.
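A schedule with this shape can be written as a function of the epoch and the current learning rate. In the sketch below the growth factor of 1.1 per epoch and the decay rate of 0.1 are assumed values chosen only for illustration; the values used in the study are not stated in the text.

```python
import math

def lr_schedule(epoch, lr, warmup_epochs=10, growth=1.1, decay_rate=0.1):
    """Increase the learning rate during warm-up, then decay it exponentially."""
    if epoch < warmup_epochs:
        return lr * growth               # assumed multiplicative warm-up
    return lr * math.exp(-decay_rate)    # exponential decay afterwards

# Simulate 20 epochs starting from a learning rate of 0.0003.
lr = 0.0003
history = []
for epoch in range(20):
    lr = lr_schedule(epoch, lr)
    history.append(lr)
```

The `(epoch, lr)` signature matches what the Keras `LearningRateScheduler` callback expects, so a function like this can be passed to it directly.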

The models are then trained using the RMSprop optimizer for a maximum of 100 epochs. The starting learning rate is set to 0.0003. We set the seed to 0 for reproducible results. Even though the neural networks are trained using multiprocessing (which is possible because the data was converted to a TensorFlow dataset), it might take some time until the training ends.

To make the predictions of the models we define a function that takes the number of predictions that we want to average. We store the average and the standard deviations of the predictions and the first prediction is also stored for comparison.
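The aggregation step can be sketched independently of the network. Below, `stochastic_predict` stands for any callable whose output changes from call to call, such as a model with dropout active, and a noisy toy function is used in its place. The filter in `confident_mean`, which keeps a prediction only when its absolute mean exceeds a chosen number of standard deviations, is an assumption about how the standard deviations are used and is not confirmed by the text.

```python
import numpy as np

def mc_dropout_predict(stochastic_predict, inputs, n_predictions=100):
    """Repeat a stochastic prediction and summarise the samples."""
    samples = np.stack([stochastic_predict(inputs) for _ in range(n_predictions)])
    # Mean and standard deviation across repetitions, plus the first draw
    # kept on its own for comparison with a single-prediction model.
    return samples.mean(axis=0), samples.std(axis=0), samples[0]

def confident_mean(mean, std, n_sigmas):
    """Assumed filter: zero out predictions that are small relative to their spread."""
    return np.where(np.abs(mean) > n_sigmas * std, mean, 0.0)

# Toy stand-in for a network with dropout active (illustrative only).
rng = np.random.default_rng(0)

def toy_model(x):
    return x.sum(axis=1) + rng.normal(scale=0.1, size=x.shape[0])

inputs = np.array([[0.5, 0.5], [0.0, 0.01]])
mean, std, first = mc_dropout_predict(toy_model, inputs)
filtered = confident_mean(mean, std, n_sigmas=2)
```

In this toy case the first input has a mean far larger than its spread and survives the filter, while the second is indistinguishable from noise and is set to zero.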

Each day we take the sum of the absolute values of the predictions and divide each prediction by this number, so that the absolute values of the weighted predictions sum to one.
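A short NumPy sketch of this normalization is given below, with one row per day and one column per stock. The guard against days on which every prediction is zero is an added assumption, as is the function name.

```python
import numpy as np

def normalize_predictions(predictions, eps=1e-12):
    """Divide each day's predictions by the sum of their absolute values."""
    total = np.abs(predictions).sum(axis=1, keepdims=True)
    # eps avoids a division by zero on days where all predictions are zero.
    return predictions / np.maximum(total, eps)

predictions = np.array([
    [0.02, -0.01, 0.01],  # day with two long positions and one short
    [0.00, 0.00, 0.00],   # day with no signal at all
])
weights = normalize_predictions(predictions)
```

The signs are preserved, so negative predictions become short weights, and the gross exposure on any day with a signal is exactly one.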

Finally, we compute the trading costs of 0.01% and define a few lambda functions to compute all the results.

For comparison we also take the data of the index in a variable.

We then define a function to compute the returns and plot the results.

Finally, we plot the results of our models compared with the buy and hold on the index for the sigmas defined above.

The yearly Sharpe ratio, Sortino ratio and information ratio are computed for each model, using the functions defined below. The risk-free return is assumed to be 0% and the benchmark for the information ratio is the return obtained through buy and hold in the Ibex 35, excluding commissions, during the test set.
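The three ratios can be sketched from daily returns as follows. The annualization with 252 trading days per year and the use of the sample standard deviation are assumptions, and the function names and toy return series are illustrative rather than taken from the original code.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year

def sharpe_ratio(returns, risk_free=0.0):
    """Annualized mean excess return over its standard deviation."""
    excess = returns - risk_free
    return np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1)

def sortino_ratio(returns, target=0.0):
    """Like the Sharpe ratio, but only returns below the target count as risk."""
    downside = np.minimum(returns - target, 0.0)
    downside_deviation = np.sqrt(np.mean(downside ** 2))
    return np.sqrt(TRADING_DAYS) * (returns.mean() - target) / downside_deviation

def information_ratio(returns, benchmark):
    """Annualized mean active return over the tracking error."""
    active = returns - benchmark
    return np.sqrt(TRADING_DAYS) * active.mean() / active.std(ddof=1)

# Toy daily returns for a strategy and for the buy-and-hold benchmark.
strategy = np.array([0.01, -0.005, 0.02, -0.01, 0.015])
benchmark = np.array([0.005, 0.0, 0.01, -0.02, 0.01])

sharpe = sharpe_ratio(strategy)
sortino = sortino_ratio(strategy)
info = information_ratio(strategy, benchmark)
```

The Sortino ratio penalizes only downside deviations, which is why it exceeds the Sharpe ratio for this toy series, where most of the variability comes from positive returns.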

We store the results of our models in a DataFrame.

Finally, we show the results of this DataFrame sorted by Sharpe ratio.

Saving data

It took a long time to generate and train our models, so we save the data we can keep in a pickle file.

4. Conclusions

By modelling the uncertainty through the standard deviation of the predictions it is possible to obtain better results than using a single prediction. In addition, a few models could generate positive returns in the test set, even when incorporating trading commissions.

Moreover, these models only used an input based on the raw percentage change of prices scaled and the addition of some statistical measures of the returns on the sequences of 22 days. However, other inputs such as fundamental data, or even some transformations of the data might produce much better results.

Furthermore, even though the convolutional layer generally performs better than the simple recurrent neural network cells, the simple RNN outperformed it with the filter using the highest number of standard deviations (9 in this case). However, the convolutional model using 5 standard deviations as the filter obtained a positive information ratio, while the GRU with the same filter produced a negative one.

To sum up, incorporating MC dropout for the predictions of deep learning has shown a great effect for predicting returns for the historical constituents of the Ibex 35.

Future lines of work

Most of the models lost money on the test set even when trading commissions were ignored, so more informative inputs might produce much better results. In addition, given that the predicted returns were divided by the sum of the absolute values of the predictions for that date, using sequences of data for all the stocks and predicting a relative value for each of them might yield better results.

Other methods for modelling uncertainty in predictions might be explored and compared with this methodology. In addition, performing more predictions under this methodology would produce a better estimation of the standard deviation of the predictions, although it requires high computing capacity, given that the predictions took more time than training the models.

Bibliography

Abe, M. and Nakayama, H., 2018. Deep Learning for Forecasting Stock Returns in the Cross-Section. In: D. Phung, V.S. Tseng, G.I. Webb, B. Ho, M. Ganji and L. Rashidi, eds. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science. Cham: Springer International Publishing. pp.273–284. https://doi.org/10.1007/978-3-319-93034-3_22.

Baldi, P. and Sadowski, P.J., n.d. Understanding Dropout. p.9.

Brach, K., Sick, B. and Dürr, O., 2020. Single Shot MC Dropout Approximation. arXiv:2007.03293 [cs, stat]. [online] Available at: http://arxiv.org/abs/2007.03293 [Accessed 7 Mar. 2021].

Chen, S. and He, H., 2018. Stock Prediction Using Convolutional Neural Network. IOP Conference Series: Materials Science and Engineering, 435, p.012026. https://doi.org/10.1088/1757-899X/435/1/012026.

Clevert, D.-A., Unterthiner, T. and Hochreiter, S., 2016. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv:1511.07289 [cs]. [online] Available at: http://arxiv.org/abs/1511.07289 [Accessed 31 Jan. 2021].

Fischer, T. and Krauss, C., 2018. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2), pp.654–669. https://doi.org/10.1016/j.ejor.2017.11.054.

Gal, Y. and Ghahramani, Z., 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. arXiv:1506.02142 [cs, stat]. [online] Available at: http://arxiv.org/abs/1506.02142 [Accessed 7 Mar. 2021].

Goodwin, T.H., 1998. The Information Ratio. Financial Analysts Journal, 54(4), pp.34–43. https://doi.org/10.2469/faj.v54.n4.2196.

Ioffe, S. and Szegedy, C., 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167 [cs]. [online] Available at: http://arxiv.org/abs/1502.03167 [Accessed 3 Feb. 2021].

Kiranyaz, S., Avci, O., Abdeljaber, O., Ince, T., Gabbouj, M. and Inman, D.J., 2021. 1D convolutional neural networks and applications: A survey. Mechanical Systems and Signal Processing, 151, p.107398. https://doi.org/10.1016/j.ymssp.2020.107398.

Molchanov, D., Ashukha, A. and Vetrov, D., 2017. Variational Dropout Sparsifies Deep Neural Networks. arXiv:1701.05369 [cs, stat]. [online] Available at: http://arxiv.org/abs/1701.05369 [Accessed 8 Mar. 2021].

Nabipour, M., Nayyeri, P., Jabani, H., Mosavi, A., Salwana, E. and S., S., 2020. Deep Learning for Stock Market Prediction. Entropy, 22(8), p.840. https://doi.org/10.3390/e22080840.

Sayavong, L., Wu, Z. and Chalita, S., 2019. Research on Stock Price Prediction Method Based on Convolutional Neural Network. In: 2019 International Conference on Virtual Reality and Intelligent Systems (ICVRIS). pp.173–176. https://doi.org/10.1109/ICVRIS.2019.00050.

Sharpe, W.F., 1994. The Sharpe Ratio. The Journal of Portfolio Management, 21(1), pp.49–58. https://doi.org/10.3905/jpm.1994.409501.

Sortino, F.A. and Price, L.N., 1994. Performance Measurement in a Downside Risk Framework. The Journal of Investing, 3(3), pp.59–64. https://doi.org/10.3905/joi.3.3.59.

References

Uncertainty in Neural Networks? Monte Carlo Dropout